skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Search for: All records

Creators/Authors contains: "Liu, Zong-Yan"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Interpreting function and fitness effects in diverse plant genomes requires transferable models. Language models (LMs) pretrained on large-scale biological sequences can capture evolutionary conservation and offer cross-species prediction better than supervised models through fine-tuning limited labeled data. We introduce PlantCaduceus, a plant DNA LM that learns evolutionary conservation patterns in 16 angiosperm genomes by modeling both DNA strands simultaneously. When fine-tuned on a small set of labeledArabidopsisdata for tasks such as predicting translation initiation/termination sites and splice donor/acceptor sites, PlantCaduceus demonstrated remarkable transferability to maize, which diverged 160 Mya. The model outperformed the best existing DNA language model by 1.45-fold in maize splice donor prediction and 7.23-fold in maize translation initiation site prediction. In variant effect prediction, PlantCaduceus showed performance comparative to state-of-the-art protein LMs. Mutations predicted to be deleterious by PlantCaduceus showed threefold lower average minor allele frequencies compared to those identified by multiple sequence alignment-based methods. Additionally, PlantCaduceus successfully identifies well-known causal variants in bothArabidopsisand maize. Overall, PlantCaduceus is a versatile DNA LM that can accelerate plant genomics and crop breeding applications. 
    more » « less
    Free, publicly-accessible full text available June 17, 2026